
    Learning Reward Machines in Cooperative Multi-Agent Tasks

    This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) that combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks. The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments and improves the interpretability of the learnt policies required to complete the cooperative task. The RMs associated with each sub-task are learnt in a decentralised manner and then used to guide the behaviour of each agent. By doing so, the complexity of a cooperative multi-agent problem is reduced, allowing for more effective learning. The results suggest that our approach is a promising direction for future research in MARL, especially in complex environments with large state spaces and multiple agents. Comment: Neuro-symbolic AI for Agent and Multi-Agent Systems Workshop at AAMAS'23.
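    To make the central object concrete, the following is a minimal sketch of a reward machine as a finite-state machine whose transitions fire on high-level events and emit rewards; the event names and the two-step sub-task are illustrative assumptions, not taken from the paper.

```python
# A minimal reward machine (RM) sketch: states, event-triggered transitions,
# and per-transition rewards. Event names below are hypothetical.

class RewardMachine:
    def __init__(self, transitions, initial_state, terminal_states):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state
        self.terminal_states = terminal_states

    def step(self, event):
        """Advance the machine on an observed event; return the reward."""
        if (self.state, event) in self.transitions:
            self.state, reward = self.transitions[(self.state, event)]
            return reward
        return 0.0  # events with no matching transition leave the RM unchanged

    def is_done(self):
        return self.state in self.terminal_states

# Hypothetical two-step sub-task: press a button, then open a door.
rm = RewardMachine(
    transitions={("u0", "button"): ("u1", 0.0),
                 ("u1", "door"): ("u2", 1.0)},
    initial_state="u0",
    terminal_states={"u2"},
)
print(rm.step("button"), rm.step("door"), rm.is_done())  # 0.0 1.0 True
```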

    Population-Based Reinforcement Learning for Combinatorial Optimization

    Applying reinforcement learning (RL) to combinatorial optimization problems is attractive as it removes the need for expert knowledge or pre-solved instances. However, it is unrealistic to expect an agent to solve these (often NP-)hard problems in a single shot at inference due to their inherent complexity. Thus, leading approaches often implement additional search strategies, ranging from stochastic sampling and beam search to explicit fine-tuning. In this paper, we argue for the benefits of learning a population of complementary policies, which can be simultaneously rolled out at inference. To this end, we introduce Poppy, a simple, theoretically grounded training procedure for populations. Instead of relying on a predefined or hand-crafted notion of diversity, Poppy induces an unsupervised specialization targeted solely at maximizing the performance of the population. We show that Poppy produces a set of complementary policies, and obtains state-of-the-art RL results on three popular NP-hard problems: the traveling salesman (TSP), capacitated vehicle routing (CVRP), and 0-1 knapsack (KP) problems. On TSP specifically, Poppy outperforms the previous state-of-the-art, dividing the optimality gap by 5 while reducing the inference time by more than an order of magnitude.
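    As a rough illustration of the population objective, the sketch below rolls out K stand-in "policies" on a random TSP instance and scores the population by its best member, which is the quantity such a population is trained to maximize; the random-permutation policies and the instance are placeholders, not the paper's learned model.

```python
import math
import random

def tour_length(points, tour):
    # total length of the closed tour over the given points
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def make_policy(seed):
    # stand-in "policy": always proposes the same random permutation
    def act(points):
        rng = random.Random(seed)
        tour = list(range(len(points)))
        rng.shuffle(tour)
        return tour
    return act

points = [(random.random(), random.random()) for _ in range(20)]
population = [make_policy(k) for k in range(8)]  # K = 8 members
# Population objective: score an instance by the best member's solution,
# so each member only needs to specialize on instances where it can win.
best = min(tour_length(points, policy(points)) for policy in population)
print(f"best-of-population tour length: {best:.3f}")
```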

    Induction of Subgoal Automata for Reinforcement Learning

    In this work we present ISA, a novel approach for learning and exploiting subgoals in reinforcement learning (RL). Our method relies on inducing an automaton whose transitions are subgoals expressed as propositional formulas over a set of observable events. A state-of-the-art inductive logic programming system is used to learn the automaton from observation traces perceived by the RL agent. The reinforcement learning and automaton learning processes are interleaved: a new refined automaton is learned whenever the RL agent generates a trace not recognized by the current automaton. We evaluate ISA in several gridworld problems and show that it performs similarly to a method for which automata are given in advance. We also show that the learned automata can be exploited to speed up convergence through reward shaping and transfer learning across multiple tasks. Finally, we analyze the running time and the number of traces that ISA needs to learn an automaton, and the impact that the number of observable events has on the learner's performance. Comment: Preprint accepted for publication at the 34th AAAI Conference on Artificial Intelligence (AAAI-20).
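    A minimal sketch of the interleaving described above, assuming placeholder callables for the RL agent (`run_episode`), the induction step (`induce_automaton`), and trace recognition (`recognizes`); these names are hypothetical stand-ins for the components used in the paper.

```python
def isa_loop(run_episode, induce_automaton, recognizes, num_episodes):
    """Interleave RL and automaton learning: refine the automaton whenever
    an episode produces a trace the current automaton does not recognize."""
    counterexamples = []
    automaton = induce_automaton(counterexamples)  # start from no traces
    for _ in range(num_episodes):
        trace = run_episode(automaton)  # RL guided by the current automaton
        if not recognizes(automaton, trace):
            counterexamples.append(trace)
            automaton = induce_automaton(counterexamples)  # refined automaton
    return automaton
```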

    Collective adaptation through concurrent planning: the case of sustainable urban mobility

    In this paper we address the challenges that impede collective adaptation in smart mobility systems by proposing a notion of ensembles. Ensembles enable systems with collective adaptability to be built as emergent aggregations of autonomous and self-adaptive agents. Adaptation in these systems is triggered by a run-time occurrence known as an issue. The novel aspect of our approach is that it allows agents affected by an issue in a smart mobility scenario to adapt collaboratively, with minimal impact on their own preferences, through an issue resolution process based on concurrent planning algorithms.
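    As a sketch of the issue resolution idea under stated assumptions (a finite set of candidate adaptations per agent, additive preference costs, and a predicate saying when a joint choice resolves the issue; all of this is hypothetical, not the paper's algorithm):

```python
from itertools import product

def resolve_issue(agents, options, resolves):
    """agents: {name: preference_cost(option)}; options: {name: [candidates]};
    resolves: predicate over a joint assignment {name: option}."""
    feasible = (dict(zip(agents, combo))
                for combo in product(*(options[a] for a in agents)))
    solutions = [joint for joint in feasible if resolves(joint)]
    # pick the resolution with the least total impact on preferences
    return min(solutions,
               key=lambda joint: sum(agents[a](joint[a]) for a in joint),
               default=None)

# Hypothetical usage: two vehicles reacting to a closed road.
agents = {"car1": lambda opt: {"reroute": 2, "wait": 5}[opt],
          "car2": lambda opt: {"reroute": 2, "wait": 1}[opt]}
options = {"car1": ["reroute", "wait"], "car2": ["reroute", "wait"]}
resolves = lambda joint: "reroute" in joint.values()  # someone must reroute
print(resolve_issue(agents, options, resolves))
# {'car1': 'reroute', 'car2': 'wait'}  (total cost 3)
```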

    Learning and Generalization in Atari Games

    Bachelor's thesis in computer science. Tutor: Anders Jonsson. This thesis describes the design of agents that learn to play Atari games, using the Arcade Learning Environment (ALE) framework to interact with them. The application of machine learning to video games, given their high complexity, is considered a bridge towards real-world domains such as robotics. The goal in Atari games is to achieve the highest possible score; to solve this task, reinforcement learning and search techniques are used. These algorithms outperform humans in 30 of the 61 games supported by ALE. Since humans are very good at generalizing between games, special emphasis is given to evaluating how well an agent learns from multiple games played simultaneously. These experiments usually result in a higher score for specific pairs of games. In addition, some games tend to achieve a higher score when trained together with other games, while other games mainly help their partners perform better.
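    A minimal sketch of the multi-game setup, assuming the `ale_py` package and a hypothetical agent interface; the ROM paths and the agent's `choose`/`learn` methods are placeholders, not the thesis's implementation.

```python
from ale_py import ALEInterface

def train_on_pair(rom_a, rom_b, agent, episodes):
    """Alternate episodes between two Atari games with a single agent, so any
    cross-game generalization shows up as a score change versus solo training."""
    ales = []
    for rom in (rom_a, rom_b):  # e.g. "roms/pong.bin" (placeholder paths)
        ale = ALEInterface()
        ale.loadROM(rom)
        ales.append(ale)
    for episode in range(episodes):
        ale = ales[episode % 2]          # round-robin between the two games
        ale.reset_game()
        episode_return = 0.0
        while not ale.game_over():
            action = agent.choose(ale)   # hypothetical agent interface
            episode_return += ale.act(action)
        agent.learn(episode_return)      # hypothetical update from the return
```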

    Resolution of concurrent planning problems using classical planning

    Master's thesis, Master in Intelligent Interactive Systems. Tutor: Anders Jonsson. In this work, we present new approaches for solving multiagent planning and temporal planning problems. These are two forms of concurrent planning, in which actions occur in parallel. The methods we propose rely on a compilation to classical planning problems that can be solved using an off-the-shelf classical planner; the solutions can then be converted back into multiagent or temporal solutions. Our compilation for multiagent planning is able to generate concurrent actions that satisfy a set of concurrency constraints. Furthermore, it avoids the exponential blowup associated with concurrent actions, a problem that many current multiagent planners face. Incorporating similar ideas into temporal planning enables us to generate temporal plans with simultaneous events, which most state-of-the-art temporal planners cannot do. In experiments, we compare our methods against existing approaches and show that the compilations to classical planning obtain better results than state-of-the-art approaches on complex problems. In contrast, we also highlight some of the drawbacks of this kind of method for both multiagent and temporal planning. Finally, we illustrate how these methods can be applied to real-world domains such as smart mobility, in which a group of vehicles and passengers must self-adapt in order to reach their target positions. The adaptation process consists of running a concurrent planning algorithm, and the behaviour of the approach is then evaluated.
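    A minimal sketch of the select-then-apply idea behind such compilations: agents commit to their individual actions one at a time (so joint actions are never enumerated, avoiding the exponential blowup), concurrency constraints are checked on the committed set, and a single apply step executes the joint action. The helper callables are hypothetical, not the thesis's actual encoding.

```python
def plan_step(agents, choose_action, compatible, apply_joint):
    """One concurrent step, serialized: select phase then apply phase."""
    committed = {}
    for agent in agents:  # select phase: one agent commits at a time
        action = choose_action(agent, committed)
        if not compatible(committed, agent, action):
            return None   # violates a concurrency constraint; backtrack/replan
        committed[agent] = action
    apply_joint(committed)  # apply phase: one classical-planning step
    return committed
```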

    Solving multiagent planning problems with concurrent conditional effects

    Paper presented at the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), the 31st Innovative Applications of Artificial Intelligence Conference (IAAI 2019) and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI 2019), held from 27 January to 1 February 2019 in Honolulu, USA. In this work we present a novel approach to solving concurrent multiagent planning problems in which several agents act in parallel. Our approach relies on a compilation from concurrent multiagent planning to classical planning, allowing us to use an off-the-shelf classical planner to solve the original multiagent problem. The solution can be directly interpreted as a concurrent plan that satisfies a given set of concurrency constraints, while avoiding the exponential blowup associated with concurrent actions. Our planner is the first to handle action effects that are conditional on what other agents are doing. Theoretically, we show that the compilation is sound and complete. Empirically, we show that our compilation can solve challenging multiagent planning problems that require concurrent actions. This work has been supported by the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502). Anders Jonsson is partially supported by grants TIN2015-67959 and PCIN-2017-082 of the Spanish Ministry of Science.
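    To illustrate what an effect conditional on other agents' actions means, here is a toy joint-action semantics; the crate-pushing example is hypothetical, not taken from the paper.

```python
def apply_push(joint_action, state):
    """Pushing a heavy crate succeeds only if at least two agents push at once:
    the effect is conditional on what the other agents are doing."""
    pushers = [a for a, act in joint_action.items() if act == "push-crate"]
    if len(pushers) >= 2:  # concurrency-conditional effect
        state["crate-moved"] = True
    return state

state = apply_push({"agent1": "push-crate", "agent2": "push-crate"}, {})
print(state)  # {'crate-moved': True}
```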

    Solving concurrent multiagent planning using classical planning

    Paper presented at the 6th Workshop on Distributed and Multi-Agent Planning (DMAP 2018), held during the 28th International Conference on Automated Planning and Scheduling, 24-29 June 2018, Delft, The Netherlands. In this work we present a novel approach to solving concurrent multiagent planning problems in which several agents act in parallel. Our approach relies on a compilation from concurrent multiagent planning to classical planning, allowing us to use an off-the-shelf classical planner to solve the original multiagent problem. The solution can be directly interpreted as a concurrent plan that satisfies a given set of concurrency constraints, while avoiding the exponential blowup associated with concurrent actions. Theoretically, we show that the compilation is sound and complete. Empirically, we show that our compilation can solve challenging multiagent planning problems that require concurrent actions. This work has been supported by the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).

    CARPooL: Collective Adaptation using concuRrent PLanning

    Paper presented at the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), held in Stockholm, 10-15 July 2018. In this paper we present the CARPooL demonstrator, an implementation of a Collective Adaptation Engine (CAE) that addresses the challenge of collective adaptation in the smart mobility domain. CARPooL resolves adaptation issues via concurrent planning techniques. It also allows users to interact with the provided solutions by adding new issues or analysing the actions performed by each agent. This work has been partially supported by the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).